Take Home Exercise 01

Published

February 4, 2024

Project Overview

It is important for local and government urban planners to know and understand how, where and when people move. For Singapore, data regarding human mobility has largely been collected and disseminated by the Land Transport Authority (LTA). However, they have only collected and publicised data regarding passenger volume from trains stations and bus stops. While immensely helpful, it still leaves out crucial information about human mobility through other means of transport, making it incomphrensive.

In 2020, Grab released Grab Posisi, a data set regarding the origin and destinations of Grab taxi users in Singapore.

In this take-home exercise, I will be finding the geographical and spatio-temporal distribution of Grab hailing service locations in Singapore.

The distribution will be analysed using the following, which will be displayed on the openstreetmap of Singapore:

  • Traditional Kernel Density Estimation (KDE) layers: to analyse the where origin points tend to cluster

  • Network Kernel Density Estimation (NKDE): to analyse the clusters of origin points, as constrained by the road.

The Data

Data Sets Used
Name Description Format Source
MP14_SUBZONE_WEB_PL Subzone Boundary of Singapore (2014) .shp data.gov.sg
Grab Posisi Dataframe of Origin and Destination Locations through Grab .parquet Provided by professor
gis_osm_roads_free_1 Road network of Malaysia, Singapore and Brunei .shp geofabrik.de

Loading the packages

In this section, I will be loading the following packages:

  • maptools

  • sf

  • raster

  • spatstat

  • tmap

  • tidyverse

  • arrow

  • lubridate

  • spNetwork

  • classInt

  • viridis

Show the code
pacman::p_load(maptools, sf, raster, spatstat, tmap, tidyverse, arrow, lubridate, spNetwork, classInt)

General Handling

Extraction of the Coastal Outline of Mainland Singapore

For the purposes of this analysis, we will also need to extract the Coastal Outline of Singapore. This is to create a boundary for all the points to fall under.

First, we need to read Singapore Master Plan Subzone 2014 to get Singapore’s shape overall.

Show the code
mpsz_sf <- st_read(dsn = "../../data/data",
                layer = "MP14_SUBZONE_WEB_PL")
Reading layer `MP14_SUBZONE_WEB_PL' from data source 
  `D:\KrisLBT\IS415-GAA\data\data' using driver `ESRI Shapefile'
Simple feature collection with 323 features and 15 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 15748.72 xmax: 56396.44 ymax: 50256.33
Projected CRS: SVY21

We have to check the CRS to ensure that it is in the correct coordinates system.

Show the code
st_crs(mpsz_sf)

Since it is in the wrong CRS (WGS84), we have to correct it to SVY21.

Show the code
mpsz_sf <- st_transform(mpsz_sf, 3414)

Now, we can plot the island.

Show the code
plot(mpsz_sf)

We notice that the planning subzone includes areas where Grab does not serve due to lack of bridges. Hence, any roads in unavailable areas would either not be useful or actually distort the network analysis.

Most of the areas unserved by Grab are the outer islands. For this, we can identify which of the outer islands to exclude from our analysis.

Show the code
outer_islands<-  filter(mpsz_sf, grepl('ISLAND', PLN_AREA_N))
plot(outer_islands["SUBZONE_N"])

However, we realise that not all the islands are unavailable to Grab drivers. Grab drivers are able to pick up and drop off at Sentosa.

Since 2019, Grab drivers have been allowed entry to Jurong Island, provided they have a security pass. However, given that very few drivers are allowed entry and the fact that this region is generally inaccessible to the public, we will be excluding them.

Hence, the islands we need to exclude are:

  • Semakau
  • Southern Group
  • Sudong
  • North-eastern Islands
  • Jurong Island and Bukom
Show the code
serviced_area <- mpsz_sf %>%
  filter(!grepl("SEMAKAU|SOUTHERN GROUP|SUDONG|NORTH-EASTERN ISLANDS|JURONG ISLAND AND BUKOM", SUBZONE_N)) 
plot(serviced_area)

Now, we can blend away the boundaries.

Show the code
sg_sf <- serviced_area %>%
  st_union()

Plot the sg_sf and we now see we have the CoastalOutline of Singapore that is served by Grab.

Show the code
plot(sg_sf)

We can save it as the CoastalOutline as a shapefile and read it again

Show the code
# writing the CoastalOutline
st_write(sg_sf, 'data/CoastalOutline.shp')
Show the code
# reading the CoastalOutline
sg_sf <- st_read(dsn='data/',
                 layer = "CoastalOutline")
Reading layer `CoastalOutline' from data source 
  `D:\KrisLBT\IS415-GAA\Take_Home_Exercises\Take_Home_Exercise_1\data' 
  using driver `ESRI Shapefile'
Simple feature collection with 1 feature and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 2667.538 ymin: 21494.3 xmax: 55941.94 ymax: 50256.33
Projected CRS: SVY21 / Singapore TM

Handling of the Grab Posisi data

We will also have to handle the Grab Posisi data.

The data is quite large. As such, Grab has divided into 10 parquet files for easier distribution.

Reading the Grab Posisi Data

I will now be loading the Grab Posisi data

Show the code
df <- read_parquet(file="../../data/data/GrabPosisi/part-00000.snappy.parquet")
df1 <- read_parquet(file="../../data/data/GrabPosisi/part-00001.snappy.parquet")
df2 <- read_parquet(file="../../data/data/GrabPosisi/part-00002.snappy.parquet")
df3 <- read_parquet(file="../../data/data/GrabPosisi/part-00003.snappy.parquet")
df4 <- read_parquet(file="../../data/data/GrabPosisi/part-00004.snappy.parquet")
df5 <- read_parquet(file="../../data/data/GrabPosisi/part-00005.snappy.parquet")
df6 <- read_parquet(file="../../data/data/GrabPosisi/part-00006.snappy.parquet")
df7 <- read_parquet(file="../../data/data/GrabPosisi/part-00007.snappy.parquet")
df8 <- read_parquet(file="../../data/data/GrabPosisi/part-00008.snappy.parquet")
df9 <- read_parquet(file="../../data/data/GrabPosisi/part-00009.snappy.parquet")

Handling the Grab Posisi data

Time stamp

Upon investigating the file, I noticed that there was supposed to be a datetime stamp but it wasn’t in that format. As such, the data type needs to be changed.

Show the code
df$pingtimestamp <- as_datetime(df$pingtimestamp)
df1$pingtimestamp <- as_datetime(df1$pingtimestamp)
df2$pingtimestamp <- as_datetime(df2$pingtimestamp)
df3$pingtimestamp <- as_datetime(df3$pingtimestamp)
df4$pingtimestamp <- as_datetime(df4$pingtimestamp)
df5$pingtimestamp <- as_datetime(df5$pingtimestamp)
df6$pingtimestamp <- as_datetime(df6$pingtimestamp)
df7$pingtimestamp <- as_datetime(df7$pingtimestamp)
df8$pingtimestamp <- as_datetime(df8$pingtimestamp)
df9$pingtimestamp <- as_datetime(df9$pingtimestamp)

Merging them into 1 data frame

I also noticed that the the same ride might actually have origin and destination points through multiple data frames. As such, we need to merge them before we can extract the origin points.

Show the code
origin_df_all <- df %>%
  rbind(df1) %>%
  rbind(df2) %>%
  rbind(df3) %>%
  rbind(df4) %>%
  rbind(df5) %>%
  rbind(df6) %>%
  rbind(df7) %>%
  rbind(df8) %>%
  rbind(df9)

Extraction of Origin

Now, I will be extracting the origin.

The way I did it below is by performing the following steps

  • arrange it according to the trj_id (unique id determining where the person wants to at that point)
  • arranging it according to the time stamp (beginning and end)
  • getting the first row - mutating the data to weekday, start hour and the day
Show the code
origin_df <- origin_df_all %>%
  group_by(trj_id) %>%
  arrange(pingtimestamp) %>%
  filter(row_number()==1) %>%
  mutate(weekday = wday(pingtimestamp,
                        label=TRUE,
                        abbr=TRUE),
         start_hr= factor(hour(pingtimestamp)),
         day = factor(mday(pingtimestamp)))
Writing of Origin Data to RDS

Now, I will write the origin data to rds for easier future handling.

Show the code
write_rds(origin_df, "data/rds/origin_dfs.rds")
Reading of RDS data

Now, we can clear the data in our environment and just read the rds data we saved.

Show the code
origin_df <- read_rds("data/rds/origin_dfs.rds")

Exploratory Data Analysis and Handling of Origins

In this section, I will be exploring the general distribution of the origin points and handling it for the subsequent sections, which will analyse the data in greater detail.

Visualising the Frequency Distribution

First, I will plot out the general distribution of the origin points

Show the code
ggplot(data=origin_df,
       aes(x=weekday)) +
  geom_bar()

As can be seen, the number of trips daily seem to be fairly equally distributed throughout the week.

Visualising the Origins as a Point Map Symbol

We can visualise the geospatial distribution of the origin points. First, we need to convert the coordinates to lat-long and transform it to the correct CRS.

Show the code
origins_sf <- st_as_sf(origin_df, coords = c("rawlng", "rawlat"),
                       crs=4326) %>%
  st_transform(crs=3414)

Now, we can visualise it using the code chunk below:

Show the code
tm_shape(mpsz_sf)+
  tm_polygons()+
tm_shape(origins_sf) +
  tm_dots()

Handling

For further analysis, we will need to convert the various data into different formats. These will be explained and performed in the code chunk below

Show the code
#conversion of sg_sf, mpsz_sf and origins_sf to a generic spatial class
sg <- as_Spatial(sg_sf)
mpsz <- as_Spatial(mpsz_sf)
origins <- as_Spatial(origins_sf) 

# Conversion of sg and origins to SpatialPolygon format
sg_sp <- as(sg, "SpatialPolygons")
origins_sp <- as(origins, "SpatialPoints")

# conversion of origins to ppp format
origins_ppp <- as(origins_sp, "ppp")
origins_ppp
Planar point pattern: 28000 points
window: rectangle = [3628.24, 49845.23] x [25198.14, 49689.64] units

Now, I will plot origins_ppp.

Show the code
plot(origins_ppp)

We must check for duplicates in the data using the code chunk below:

Show the code
any(duplicated(origins_ppp))
[1] FALSE

As there are no duplicates within the data, we do not have to apply any more transformation to the data.

Create Owin Data

Now, we can convert the Coastal Outline to owin data and plot it out.

Show the code
sg_owin <- as(sg_sp, "owin")
plot(sg_owin)

Combining Events

We will now extract Grab origins within the Singapore CoastalOutline

Show the code
origins_SG_ppp = origins_ppp[sg_owin]

# plotting the distribution of the origins
plot(origins_SG_ppp)

Traditional Kernel Density Estimation Analysis

Now we can begin the traditional kernel density estimation analysis. As mentioned in Project Overview, this analysis will enable us to find areas where there is a greater density of origin points. As such, we can find where Grab users tend to book their Grab trips.

Country Level Analysis

We can begin by trying to find all of the best kernel and bandwidths for the purposes of our analysis. To do this, we need to find the one with the tightest clusters.

In R, we have 4 possible kernel density bandwidths: diggle, ppl, scott and CvL. All of them will be tested below:

Show the code
# diggle
kde_origins_SG.bw <- density(origins_SG_ppp,
                              sigma=bw.diggle,
                              edge=TRUE,
                            kernel="gaussian") 
# ppl
kde_origins_SG.ppl <- density(origins_SG_ppp, 
                               sigma=bw.ppl, 
                               edge=TRUE,
                               kernel="gaussian")
# CvL
kde_origins_SG.CvL <- density(origins_SG_ppp, 
                               sigma=bw.CvL, 
                               edge=TRUE,
                               kernel="gaussian")
# scott
kde_origins_SG.scott <- density(origins_SG_ppp, 
                               sigma=bw.scott, 
                               edge=TRUE,
                               kernel="gaussian")

We may also plot them out:

Show the code
par(mfrow=c(2,2))
plot(kde_origins_SG.bw)
plot(kde_origins_SG.ppl)
plot(kde_origins_SG.CvL)
plot(kde_origins_SG.scott)

Now, we can check for the tightness of these clusters:

Show the code
# tightness of diggle
bw.diggle(origins_SG_ppp)

# tightness of ppl
bw.ppl(origins_SG_ppp)

# tightness of CvL
bw.CvL(origins_SG_ppp)

# tightness of scott
bw.scott(origins_SG_ppp)

We can observe that the smallest sigma value is from diggle, suggesting that it has the tightest clusters of all the bandwidth. Hence, we will continue using diggle for this analysis.

We can also test for different kernels:

Show the code
par(mfrow=c(2,2))
plot(density(origins_SG_ppp, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="gaussian"), 
     main="Gaussian")
plot(density(origins_SG_ppp, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="epanechnikov"), 
     main="Epanechnikov")
plot(density(origins_SG_ppp, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="quartic"), 
     main="Quartic")
plot(density(origins_SG_ppp, 
             sigma=bw.diggle, 
             edge=TRUE, 
             kernel="disc"), 
     main="Disc")

As they are identical, there is no need to choose a specific one. Moving forward, I will be using Gaussian kernel.

Regions of Interest

We can tighten our analysis to include only areas of interest. From the figures above, we observe higher densities in the planning areas Changi, Jurong East, Woodlands and Marine Parade, we can filter out mpsz to get them.

Show the code
# extraction of Changi Airport
CA = mpsz_sf[mpsz_sf$PLN_AREA_N=="CHANGI",]

# extraction of Jurong East
JE = mpsz_sf[mpsz_sf$PLN_AREA_N=="JURONG EAST",]

# extraction of Woodlands
WL = mpsz_sf[mpsz_sf$PLN_AREA_N=="WOODLANDS",]

# extraction of Marine Parade
MP = mpsz_sf[mpsz_sf$PLN_AREA_N=='MARINE PARADE',]

With this, we now have to perform the same functions we did in Create Owin Data and Conversions.

Show the code
# turn them into spatial
CA_spatial <- as_Spatial(CA)
JE_spatial <- as_Spatial(JE)
WL_spatial <- as_Spatial(WL)
MP_spatial <- as_Spatial(MP)

# turn them into SpatialPolygons
CA_sp <- as(CA_spatial, "SpatialPolygons")
JE_sp <- as(JE_spatial, "SpatialPolygons")
WL_sp <- as(WL_spatial, "SpatialPolygons")
MP_sp <- as(MP_spatial, "SpatialPolygons")

# convert to owin

CA_owin = as(CA_sp, "owin")
JE_owin = as(JE_sp, "owin")
WL_owin = as(WL_sp, "owin")
MP_owin = as(MP_sp, "owin")

From here, we can extract the events that occurred in these planning areas.

Show the code
origins_CA_ppp = origins_SG_ppp[CA_owin]
origins_JE_ppp = origins_SG_ppp[JE_owin]
origins_WL_ppp = origins_SG_ppp[WL_owin]
origins_MP_ppp = origins_SG_ppp[MP_owin]

We can now perform the same analysis that we did in Country Level Analysis:

Show the code
kde_origins_CA <- density(origins_CA_ppp,
                              sigma=bw.diggle,
                              edge=TRUE,
                            kernel="gaussian") 
kde_origins_JE <- density(origins_JE_ppp,
                              sigma=bw.diggle,
                              edge=TRUE,
                            kernel="gaussian") 
kde_origins_WL <- density(origins_WL_ppp,
                              sigma=bw.diggle,
                              edge=TRUE,
                            kernel="gaussian") 
kde_origins_MP <- density(origins_MP_ppp,
                              sigma=bw.diggle,
                              edge=TRUE,
                            kernel="gaussian") 

We can also plot them out:

Show the code
par(mfrow=c(2,2))
plot(kde_origins_CA,
     main="Changi")
plot(kde_origins_WL,
     main="Woodlands")
plot(kde_origins_JE,
     main="Jurong East")
plot(kde_origins_MP,
     main="Marine Parade")

Clark and Evans test

The Clark and Evans test allows us to understand if the data points are clustered in any places or not.

Country Level Analysis

In the below code chunk, we are conducting this test on the origin points in the whole of Singapore. The hypothesis are as follows

H0: The Grab origin points are distributed equally throughout the country

H1: The Grab origin points have clusters.

Show the code
clarkevans.test(origins_SG_ppp,
                correction="none",
                clipregion="sg_owin",
                alternative=c("clustered"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Z-test

data:  origins_SG_ppp
R = 0.28039, p-value < 2.2e-16
alternative hypothesis: clustered (R < 1)

From this, we can see that there is clustering (R=0.28039) at a p-value of 2.2e-16. As 2.2e-16 is smaller than 0.05, we can conclude that this difference is significant at a 5% significance level. Hence, we reject the null hypothesis and the Grab origin points have clusters.

Regions of Interest

Changi

Show the code
clarkevans.test(origins_CA_ppp,
                correction="none",
                clipregion=NULL,
                alternative=c("two.sided"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Z-test

data:  origins_CA_ppp
R = 0.13547, p-value < 2.2e-16
alternative hypothesis: two-sided

From this, we can see that there is clustering (R= 0.31778) at a p-value of 2.2e-16. As 2.2e-16 is smaller than 0.05, we can conclude that this difference is significant at a 5% significance level. Hence, we reject the null hypothesis and the Grab origin points have clusters.

Woodlands

Show the code
clarkevans.test(origins_WL_ppp,
                correction="none",
                clipregion=NULL,
                alternative=c("two.sided"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Z-test

data:  origins_WL_ppp
R = 0.31778, p-value < 2.2e-16
alternative hypothesis: two-sided

From this, we can see that there is clustering (R= 0.25797) at a p-value of 2.2e-16. As 2.2e-16 is smaller than 0.05, we can conclude that this difference is significant at a 5% significance level. Hence, we reject the null hypothesis and the Grab origin points have clusters.

Jurong East

Show the code
clarkevans.test(origins_JE_ppp,
                correction="none",
                clipregion=NULL,
                alternative=c("two.sided"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Z-test

data:  origins_JE_ppp
R = 0.25797, p-value < 2.2e-16
alternative hypothesis: two-sided

From this, we can see that there is clustering (R= 0.25797) at a p-value of 2.2e-16. As 2.2e-16 is smaller than 0.05, we can conclude that this difference is significant at a 5% significance level. Hence, we reject the null hypothesis and the Grab origin points have clusters.

Marine Parade

Show the code
clarkevans.test(origins_MP_ppp,
                correction="none",
                clipregion=NULL,
                alternative=c("two.sided"),
                nsim=99)

    Clark-Evans test
    No edge correction
    Z-test

data:  origins_MP_ppp
R = 0.51201, p-value < 2.2e-16
alternative hypothesis: two-sided

From this, we can see that there is clustering (R= 0.51201) at a p-value of 2.2e-16. As 2.2e-16 is smaller than 0.05, we can conclude that this difference is significant at a 5% significance level. Hence, we reject the null hypothesis and the Grab origin points have clusters.

Network Kernel Density Estimation

While the above does show the distribution of the points and which areas are more concentrated than others, they do not account for the network of the roads. In this section, we will be finding the networks (roads) that are more densely used as origins points in Changi Airport and Marina East.

Data Handling

Handling of Road Data

Now, we can read the road data as provided by openstreetmap. Note that it provides data for Malaysia, Singapore and Brunei all at once.

Show the code
all_roads <- st_read(dsn='../../data/data/data',
                     layer = 'gis_osm_roads_free_1')

We realise that it is in WGS 84, not in SVY21 so we have to transform the data.

Show the code
all_roads <- st_transform(all_roads, 3414)

We can check the CRS again for all_roads.

Show the code
st_crs(all_roads)

Since we’re only interested in the roads in mainland Singapore, we have to filter the roads.

Show the code
# get all roads in mainland Singapore
SG_roads <- st_intersection(all_roads, sg_sf)

# write out the roads into a separate .shp file for easier future handling
st_write(SG_roads, 'data/SG_roads.shp')

We can now read it from here.

Show the code
SG_roads <- st_read(dsn='data', layer= 'SG_roads')
Reading layer `SG_roads' from data source 
  `D:\KrisLBT\IS415-GAA\Take_Home_Exercises\Take_Home_Exercise_1\data' 
  using driver `ESRI Shapefile'
Simple feature collection with 227969 features and 10 fields
Geometry type: MULTILINESTRING
Dimension:     XY
Bounding box:  xmin: 2679.373 ymin: 23099.51 xmax: 50957.8 ymax: 50220.06
Projected CRS: SVY21 / Singapore TM

Getting the Roads within the Subzones

Now, we can find the streets that exist only inside the subzones.

Show the code
# get roads only in the aforementioned subzones
CA_roads <- st_intersection(SG_roads, CA)
WL_roads <- st_intersection(SG_roads, WL)
JE_roads <- st_intersection(SG_roads, JE)
MP_roads <- st_intersection(SG_roads, MP)
Show the code
CA_roads
Simple feature collection with 4007 features and 25 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 42577.65 ymin: 32548.05 xmax: 50286.8 ymax: 41649.2
Projected CRS: SVY21 / Singapore TM
First 10 features:
       osm_id code      fclass                  name  ref oneway maxspeed layer
356  22617051 5113     primary         Loyang Avenue <NA>      F       70     0
360  22617159 5113     primary         Loyang Avenue <NA>      F       70     0
1348 22981700 5113     primary       Telok Paku Road <NA>      F       50     0
3201 34403286 5114   secondary            Loyang Way <NA>      F       50     0
3207 34403387 5122 residential Changi North Street 1 <NA>      B        0     0
3208 34403405 5122 residential Changi North Crescent <NA>      B       50     0
3209 34403417 5122 residential     Changi North Rise <NA>      B        0     0
4001 42063713 5141     service                  <NA> <NA>      F        0     0
4002 42063715 5141     service                  <NA> <NA>      B        0     0
4003 42063716 5141     service                  <NA> <NA>      B        0     0
     bridge tunnel OBJECTID SUBZONE_NO   SUBZONE_N SUBZONE_C CA_IND PLN_AREA_N
356       F      F      221          2 CHANGI WEST    CHSZ02      N     CHANGI
360       F      F      221          2 CHANGI WEST    CHSZ02      N     CHANGI
1348      F      F      221          2 CHANGI WEST    CHSZ02      N     CHANGI
3201      F      F      221          2 CHANGI WEST    CHSZ02      N     CHANGI
3207      F      F      221          2 CHANGI WEST    CHSZ02      N     CHANGI
3208      F      F      221          2 CHANGI WEST    CHSZ02      N     CHANGI
3209      F      F      221          2 CHANGI WEST    CHSZ02      N     CHANGI
4001      F      F      221          2 CHANGI WEST    CHSZ02      N     CHANGI
4002      F      F      221          2 CHANGI WEST    CHSZ02      N     CHANGI
4003      F      F      221          2 CHANGI WEST    CHSZ02      N     CHANGI
     PLN_AREA_C    REGION_N REGION_C          INC_CRC FMEL_UPD_D   X_ADDR
356          CH EAST REGION       ER 7460F2CFB1D7D36C 2014-12-05 44092.34
360          CH EAST REGION       ER 7460F2CFB1D7D36C 2014-12-05 44092.34
1348         CH EAST REGION       ER 7460F2CFB1D7D36C 2014-12-05 44092.34
3201         CH EAST REGION       ER 7460F2CFB1D7D36C 2014-12-05 44092.34
3207         CH EAST REGION       ER 7460F2CFB1D7D36C 2014-12-05 44092.34
3208         CH EAST REGION       ER 7460F2CFB1D7D36C 2014-12-05 44092.34
3209         CH EAST REGION       ER 7460F2CFB1D7D36C 2014-12-05 44092.34
4001         CH EAST REGION       ER 7460F2CFB1D7D36C 2014-12-05 44092.34
4002         CH EAST REGION       ER 7460F2CFB1D7D36C 2014-12-05 44092.34
4003         CH EAST REGION       ER 7460F2CFB1D7D36C 2014-12-05 44092.34
       Y_ADDR SHAPE_Leng SHAPE_Area                       geometry
356  38928.66   14918.11    4848517 LINESTRING (44617.88 41088....
360  38928.66   14918.11    4848517 LINESTRING (43851.2 39632.9...
1348 38928.66   14918.11    4848517 LINESTRING (45352.67 41113....
3201 38928.66   14918.11    4848517 LINESTRING (43761.28 39560....
3207 38928.66   14918.11    4848517 LINESTRING (43152.23 36906....
3208 38928.66   14918.11    4848517 LINESTRING (42843.89 36842....
3209 38928.66   14918.11    4848517 LINESTRING (43130.89 36812....
4001 38928.66   14918.11    4848517 LINESTRING (43433.23 37663....
4002 38928.66   14918.11    4848517 LINESTRING (43436.11 37657....
4003 38928.66   14918.11    4848517 LINESTRING (43511.5 37906.7...
Show the code
WL_roads
Simple feature collection with 7165 features and 25 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 20613.4 ymin: 44814.83 xmax: 25593.19 ymax: 49200.77
Projected CRS: SVY21 / Singapore TM
First 10 features:
       osm_id code        fclass                name  ref oneway maxspeed layer
796  22773625 5115      tertiary  Woodlands Avenue 6 <NA>      F       40     0
819  22774351 5131 motorway_link                <NA> <NA>      F       50     0
825  22774532 5122   residential  Woodlands Drive 14 <NA>      F       40     0
826  22774537 5122   residential  Woodlands Drive 53 <NA>      B       50     0
827  22774541 5122   residential  Woodlands Drive 43 <NA>      F       40     0
832  22775173 5115      tertiary  Woodlands Avenue 5 <NA>      F       60     0
834  22775385 5113       primary Woodlands Avenue 12 <NA>      F       70     0
835  22775386 5114     secondary  Woodlands Avenue 1 <NA>      F       50     0
836  22775389 5114     secondary  Woodlands Avenue 1 <NA>      F       50     0
3703 37584630 5111      motorway  Seletar Expressway  SLE      F       90     1
     bridge tunnel OBJECTID SUBZONE_NO       SUBZONE_N SUBZONE_C CA_IND
796       F      F      282          4 WOODLANDS SOUTH    WDSZ04      N
819       F      F      282          4 WOODLANDS SOUTH    WDSZ04      N
825       F      F      282          4 WOODLANDS SOUTH    WDSZ04      N
826       F      F      282          4 WOODLANDS SOUTH    WDSZ04      N
827       F      F      282          4 WOODLANDS SOUTH    WDSZ04      N
832       F      F      282          4 WOODLANDS SOUTH    WDSZ04      N
834       F      F      282          4 WOODLANDS SOUTH    WDSZ04      N
835       F      F      282          4 WOODLANDS SOUTH    WDSZ04      N
836       F      F      282          4 WOODLANDS SOUTH    WDSZ04      N
3703      T      F      282          4 WOODLANDS SOUTH    WDSZ04      N
     PLN_AREA_N PLN_AREA_C     REGION_N REGION_C          INC_CRC FMEL_UPD_D
796   WOODLANDS         WD NORTH REGION       NR 8A4E14DAC4ACE11C 2014-12-05
819   WOODLANDS         WD NORTH REGION       NR 8A4E14DAC4ACE11C 2014-12-05
825   WOODLANDS         WD NORTH REGION       NR 8A4E14DAC4ACE11C 2014-12-05
826   WOODLANDS         WD NORTH REGION       NR 8A4E14DAC4ACE11C 2014-12-05
827   WOODLANDS         WD NORTH REGION       NR 8A4E14DAC4ACE11C 2014-12-05
832   WOODLANDS         WD NORTH REGION       NR 8A4E14DAC4ACE11C 2014-12-05
834   WOODLANDS         WD NORTH REGION       NR 8A4E14DAC4ACE11C 2014-12-05
835   WOODLANDS         WD NORTH REGION       NR 8A4E14DAC4ACE11C 2014-12-05
836   WOODLANDS         WD NORTH REGION       NR 8A4E14DAC4ACE11C 2014-12-05
3703  WOODLANDS         WD NORTH REGION       NR 8A4E14DAC4ACE11C 2014-12-05
       X_ADDR   Y_ADDR SHAPE_Leng SHAPE_Area                       geometry
796  23609.57 45692.88   5211.384    1576001 LINESTRING (24007.3 45538.1...
819  23609.57 45692.88   5211.384    1576001 LINESTRING (22821.47 45515....
825  23609.57 45692.88   5211.384    1576001 LINESTRING (23444.28 46022....
826  23609.57 45692.88   5211.384    1576001 LINESTRING (23990.45 46088....
827  23609.57 45692.88   5211.384    1576001 LINESTRING (23447.47 46014....
832  23609.57 45692.88   5211.384    1576001 LINESTRING (24645.58 46021....
834  23609.57 45692.88   5211.384    1576001 LINESTRING (23836.85 45038....
835  23609.57 45692.88   5211.384    1576001 LINESTRING (23124.88 45989....
836  23609.57 45692.88   5211.384    1576001 LINESTRING (24156.4 45407.0...
3703 23609.57 45692.88   5211.384    1576001 LINESTRING (23684.88 44959....
Show the code
JE_roads
Simple feature collection with 7131 features and 25 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 14254.68 ymin: 30994.22 xmax: 19398.25 ymax: 37289.81
Projected CRS: SVY21 / Singapore TM
First 10 features:
       osm_id code       fclass                     name  ref oneway maxspeed
1084 22903885 5114    secondary             Penjuru Road <NA>      F       60
1085 22903886 5121 unclassified             Penjuru Road <NA>      F       60
1443 23101520 5112        trunk              Jalan Buroh <NA>      F       70
3791 39959160 5112        trunk              Jalan Buroh <NA>      F       70
3792 39959161 5112        trunk              Jalan Buroh <NA>      F       70
5003 70583774 5115     tertiary              Pandan Road <NA>      B       50
5004 70583780 5121 unclassified             Penjuru Lane <NA>      B       50
5005 70583783 5114    secondary          Tanjong Penjuru <NA>      B       50
5006 70583786 5114    secondary          Tanjong Penjuru <NA>      F       50
5007 70583790 5121 unclassified Tanjong Penjuru Crescent <NA>      B       50
     layer bridge tunnel OBJECTID SUBZONE_NO        SUBZONE_N SUBZONE_C CA_IND
1084     0      F      F       80         10 PENJURU CRESCENT    JESZ10      N
1085     0      F      F       80         10 PENJURU CRESCENT    JESZ10      N
1443     1      T      F       80         10 PENJURU CRESCENT    JESZ10      N
3791     1      T      F       80         10 PENJURU CRESCENT    JESZ10      N
3792     0      F      F       80         10 PENJURU CRESCENT    JESZ10      N
5003     0      F      F       80         10 PENJURU CRESCENT    JESZ10      N
5004     0      F      F       80         10 PENJURU CRESCENT    JESZ10      N
5005     0      F      F       80         10 PENJURU CRESCENT    JESZ10      N
5006     0      F      F       80         10 PENJURU CRESCENT    JESZ10      N
5007     0      F      F       80         10 PENJURU CRESCENT    JESZ10      N
      PLN_AREA_N PLN_AREA_C    REGION_N REGION_C          INC_CRC FMEL_UPD_D
1084 JURONG EAST         JE WEST REGION       WR BC263278706376DE 2014-12-05
1085 JURONG EAST         JE WEST REGION       WR BC263278706376DE 2014-12-05
1443 JURONG EAST         JE WEST REGION       WR BC263278706376DE 2014-12-05
3791 JURONG EAST         JE WEST REGION       WR BC263278706376DE 2014-12-05
3792 JURONG EAST         JE WEST REGION       WR BC263278706376DE 2014-12-05
5003 JURONG EAST         JE WEST REGION       WR BC263278706376DE 2014-12-05
5004 JURONG EAST         JE WEST REGION       WR BC263278706376DE 2014-12-05
5005 JURONG EAST         JE WEST REGION       WR BC263278706376DE 2014-12-05
5006 JURONG EAST         JE WEST REGION       WR BC263278706376DE 2014-12-05
5007 JURONG EAST         JE WEST REGION       WR BC263278706376DE 2014-12-05
       X_ADDR   Y_ADDR SHAPE_Leng SHAPE_Area                       geometry
1084 17651.31 31799.61   9876.049    3049720 LINESTRING (16991.44 32621....
1085 17651.31 31799.61   9876.049    3049720 LINESTRING (16738.48 31589....
1443 17651.31 31799.61   9876.049    3049720 LINESTRING (16290.72 32753....
3791 17651.31 31799.61   9876.049    3049720 LINESTRING (19056.98 32186....
3792 17651.31 31799.61   9876.049    3049720 LINESTRING (18978.73 32113....
5003 17651.31 31799.61   9876.049    3049720 LINESTRING (18411.82 31559....
5004 17651.31 31799.61   9876.049    3049720 LINESTRING (17208.01 32320....
5005 17651.31 31799.61   9876.049    3049720 LINESTRING (16869.64 32131....
5006 17651.31 31799.61   9876.049    3049720 LINESTRING (17506.95 32049....
5007 17651.31 31799.61   9876.049    3049720 LINESTRING (17697.91 31764....
Show the code
MP_roads
Simple feature collection with 2864 features and 25 fields
Geometry type: GEOMETRY
Dimension:     XY
Bounding box:  xmin: 32697.12 ymin: 29763 xmax: 37589.95 ymax: 32843.37
Projected CRS: SVY21 / Singapore TM
First 10 features:
       osm_id code       fclass                         name  ref oneway
922  22810641 5114    secondary    Tanjong Katong Road South <NA>      F
1202 22930502 5113      primary            Marina East Drive <NA>      F
3760 39477475 5111     motorway           East Coast Parkway  ECP      F
3761 39477476 5111     motorway           East Coast Parkway  ECP      F
4078 44122367 5152     cycleway               Park Connector <NA>      B
4097 44133236 5153      footway      Underpass to Meyer Road <NA>      B
4140 44488255 5153      footway                  Katong Park <NA>      B
4141 44488257 5121 unclassified East Coast Park Service Road <NA>      B
4142 44488265 5114    secondary    Tanjong Katong Road South <NA>      F
5509 74729034 5111     motorway           East Coast Parkway  ECP      F
     maxspeed layer bridge tunnel OBJECTID SUBZONE_NO        SUBZONE_N
922        50     0      F      F       44          5 MARINA EAST (MP)
1202       60     0      F      F       44          5 MARINA EAST (MP)
3760       90     1      T      F       44          5 MARINA EAST (MP)
3761       80     0      F      F       44          5 MARINA EAST (MP)
4078        0     0      F      F       44          5 MARINA EAST (MP)
4097        0    -1      F      T       44          5 MARINA EAST (MP)
4140        0    -1      F      T       44          5 MARINA EAST (MP)
4141       50     0      F      F       44          5 MARINA EAST (MP)
4142       50     1      T      F       44          5 MARINA EAST (MP)
5509       90     0      F      F       44          5 MARINA EAST (MP)
     SUBZONE_C CA_IND    PLN_AREA_N PLN_AREA_C       REGION_N REGION_C
922     MPSZ05      N MARINE PARADE         MP CENTRAL REGION       CR
1202    MPSZ05      N MARINE PARADE         MP CENTRAL REGION       CR
3760    MPSZ05      N MARINE PARADE         MP CENTRAL REGION       CR
3761    MPSZ05      N MARINE PARADE         MP CENTRAL REGION       CR
4078    MPSZ05      N MARINE PARADE         MP CENTRAL REGION       CR
4097    MPSZ05      N MARINE PARADE         MP CENTRAL REGION       CR
4140    MPSZ05      N MARINE PARADE         MP CENTRAL REGION       CR
4141    MPSZ05      N MARINE PARADE         MP CENTRAL REGION       CR
4142    MPSZ05      N MARINE PARADE         MP CENTRAL REGION       CR
5509    MPSZ05      N MARINE PARADE         MP CENTRAL REGION       CR
              INC_CRC FMEL_UPD_D  X_ADDR   Y_ADDR SHAPE_Leng SHAPE_Area
922  1575868DF0D30E32 2014-12-05 33715.7 30512.25   6657.151    1590340
1202 1575868DF0D30E32 2014-12-05 33715.7 30512.25   6657.151    1590340
3760 1575868DF0D30E32 2014-12-05 33715.7 30512.25   6657.151    1590340
3761 1575868DF0D30E32 2014-12-05 33715.7 30512.25   6657.151    1590340
4078 1575868DF0D30E32 2014-12-05 33715.7 30512.25   6657.151    1590340
4097 1575868DF0D30E32 2014-12-05 33715.7 30512.25   6657.151    1590340
4140 1575868DF0D30E32 2014-12-05 33715.7 30512.25   6657.151    1590340
4141 1575868DF0D30E32 2014-12-05 33715.7 30512.25   6657.151    1590340
4142 1575868DF0D30E32 2014-12-05 33715.7 30512.25   6657.151    1590340
5509 1575868DF0D30E32 2014-12-05 33715.7 30512.25   6657.151    1590340
                           geometry
922  LINESTRING (35230.6 31057.3...
1202 LINESTRING (33760.43 30638....
3760 LINESTRING (33860.36 30899....
3761 LINESTRING (33706.71 30894....
4078 LINESTRING (34789.2 30864.6...
4097 LINESTRING (34772.07 30914....
4140 LINESTRING (34005.49 30906....
4141 LINESTRING (35255.73 30985,...
4142 LINESTRING (35227.24 31091....
5509 LINESTRING (34854.12 30932....

We notice that all of the above are geometry type geometry and not a linestring. We can convert them with the following code chunk:

Show the code
CA_roads <- CA_roads %>%
  st_cast("LINESTRING")

WL_roads <- WL_roads %>%
  st_cast("LINESTRING")

JE_roads <- JE_roads %>%
  st_cast("LINESTRING")

MP_roads <- MP_roads %>%
  st_cast("LINESTRING")

Extraction of the Events Within the Subzones

Before we can conduct analysis on the network, we will also need to constrict the events to exclusively the ones that occurred within these two subzones.

Show the code
#extraction of origin events that occurred within Changi and Marine Parade
CA_origins <- st_intersection(origins_sf, CA)
WL_origins <- st_intersection(origins_sf, WL)
JE_origins <- st_intersection(origins_sf, JE)
MP_origins <- st_intersection(origins_sf, MP)

We can now plot this data. In this instance, we can use Changi as an example:

Show the code
tmap_mode('view')
tm_shape(CA_origins)+
  tm_dots() +
  tm_shape(CA_roads) +
  tm_lines()
Show the code
tmap_mode('plot')

Network Constrained KDE (NetKDE) Analysis

Preparing the lixel objects

Before computing NetKDE, the SpatialLines object need to be cut into lixels with a specified minimal distance. This task can be performed by using with lixelize_lines() of spNetwork as shown in the code chunk below.

Show the code
CA_lixels <- lixelize_lines(CA_roads, 
                         750, 
                         mindist = 375)
WL_lixels <- lixelize_lines(WL_roads, 
                         750, 
                         mindist = 375)
JE_lixels <- lixelize_lines(JE_roads, 
                         750, 
                         mindist = 375)
MP_lixels <- lixelize_lines(MP_roads, 
                         750, 
                         mindist = 375)

Generating Line Points

Next, lines_center() of spNetwork will be used to generate a SpatialPointsDataFrame (i.e. samples) with line centre points as shown in the code chunk below.

Show the code
CA_samples <- lines_center(CA_lixels)
WL_samples <- lines_center(WL_lixels)
JE_samples <- lines_center(JE_lixels)
MP_samples <- lines_center(MP_lixels)

Performing NetKDE

We can now compute the NetKDE using the code chunk below:

Show the code
# for the purposes of this analysis, the bandwidth will be set to 8.080901, which is the diggle bandwidth for Singapore overall
CA_densities <- nkde(CA_roads, 
                  events = CA_origins,
                  w = rep(1,nrow(CA_origins)),
                  samples = CA_samples,
                  kernel_name = "quartic",
                  bw = 8.080901, 
                  div= "bw", 
                  method = "simple", 
                  digits = 1, 
                  tol = 1,
                  grid_shape = c(1,1), 
                  max_depth = 8,
                  agg = 10, #we aggregate events within a 10m radius (faster calculation)
                  sparse = TRUE,
                  verbose = FALSE)
# care about bw (bandwith) and kernel_name 

JE_densities <- nkde(JE_roads, 
                  events = JE_origins,
                  w = rep(1,nrow(JE_origins)),
                  samples = JE_samples,
                  kernel_name = "quartic",
                  bw = 8.080901, 
                  div= "bw", 
                  method = "simple", 
                  digits = 1, 
                  tol = 1,
                  grid_shape = c(1,1), 
                  max_depth = 8,
                  agg = 10, #we aggregate events within a 10m radius (faster calculation)
                  sparse = TRUE,
                  verbose = FALSE)
 # care about bw (bandwith) and kernel_name 

WL_densities <- nkde(WL_roads, 
                  events = WL_origins,
                  w = rep(1,nrow(WL_origins)),
                  samples = WL_samples,
                  kernel_name = "quartic",
                  bw = 8.080901, 
                  div= "bw", 
                  method = "simple", 
                  digits = 1, 
                  tol = 1,
                  grid_shape = c(1,1), 
                  max_depth = 8,
                  agg = 10, #we aggregate events within a 10m radius (faster calculation)
                  sparse = TRUE,
                  verbose = FALSE)
# care about bw (bandwith) and kernel_name 

MP_densities <- nkde(MP_roads, 
                  events = MP_origins,
                  w = rep(1,nrow(MP_origins)),
                  samples = MP_samples,
                  kernel_name = "quartic",
                  bw = 8.080901, 
                  div= "bw", 
                  method = "simple", 
                  digits = 1, 
                  tol = 1,
                  grid_shape = c(1,1), 
                  max_depth = 8,
                  agg = 10, #we aggregate events within a 10m radius (faster calculation)
                  sparse = TRUE,
                  verbose = FALSE)
# care about bw (bandwith) and kernel_name 

Visualising NetKDE

Before we can visualise the NetKDE values, code chunk below will be used to insert the computed density values (i.e. densities) into samples and lixels objects as density field.

Show the code
CA_samples$density <- CA_densities
CA_lixels$density <- CA_densities

JE_samples$density <- JE_densities
JE_lixels$density <- JE_densities

MP_samples$density <- MP_densities
MP_lixels$density <- MP_densities

WL_samples$density <- WL_densities
WL_lixels$density <- WL_densities

We can also rescale to make the values more understandable. Since the values are in kilometres, multiplying the values by 1000 will convert them to metres.

We can use the code chunk below to visualise the network kernel density estimation:

Visualising NetKDE for Changi

Show the code
tmap_mode('view')
tm_shape(CA_lixels)+
  tm_lines(col="density")
Show the code
tmap_mode('plot')

Visualising NetKDE for Jurong East

Show the code
tmap_mode('view')
tm_shape(JE_lixels)+
  tm_lines(col="density")
Show the code
tmap_mode('plot')

Visualising NetKDE for Marine Parade

Show the code
tmap_mode('view')
tm_shape(MP_lixels)+
  tm_lines(col="density")
Show the code
tmap_mode('plot')

Visualising NetKDE for Woodlands

Show the code
tmap_mode('view')
tm_shape(WL_lixels)+
  tm_lines(col="density")
Show the code
tmap_mode('plot')

Conclusion

From the above, we can conclude that there is visible clustering where people book Grab rides in Singapore. This could be attributed to the public transportation system being comprehensive and comparatively inexpensive, which disinclines commuters from opting for Grab taxis. Nonetheless, this analysis is inconclusive as it’s only the total number of rides for a period of a few. Hence, it may not be representative of population behaviour over time.